GWASLab
  • Home
  • Tutorial for gwaslab
  • Sumstats manipulation
    • SumstatsObject
    • StatusCode
    • Standardization&normalization
    • QC&Filtering
    • Harmonization
    • Liftover
    • Data conversion
  • Visualization
    • Manhattan & QQ plot
    • Regional Plot
    • Brisbane Plot
    • Miami Plot
    • Scatter: effect size comparison
    • Scatter: allele frequency comparison
    • Heatmap: genetic correlation
    • Forest Plot
    • Gallery
  • Utilities
    • Extract Lead Variants
    • Extract Novel Variants
    • Format & Save
    • Load LDSC log
    • Convert Heritability
    • Infer Genome Build
  • References
    • Reference data
    • Common data
    • Update logs
  • Examples
    • Standardization and QC example
      • Standardization Workflow
      • QC & Filtering
    • Harmonization example
      • Workflow
      • Liftover
    • Formatting and saving
    • Utility example
      • Data conversion
      • Get novel variants
    • Visualization example
      • Manhattan and Q-Q plot
      • Miami plot
      • Regional plot
      • Brisbane plot
  • Previous
  • Next
  • GitHub
  • Regional plot
  • load gwaslab
  • download sample data
  • load sumstats into gwaslab.Sumstats
  • a quick mmq plot
  • filter and basic check
  • check my available reference data

Regional plot¶

load gwaslab¶

In [1]:
Copied!
import sys
sys.path.insert(0,"/Users/he/work/gwaslab/src")
import gwaslab as gl
import sys sys.path.insert(0,"/Users/he/work/gwaslab/src") import gwaslab as gl

download sample data¶

In [2]:
Copied!
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
--2022-10-20 22:22:11--  http://jenger.riken.jp/14/
Resolving jenger.riken.jp (jenger.riken.jp)... 134.160.84.25
Connecting to jenger.riken.jp (jenger.riken.jp)|134.160.84.25|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 274187574 (261M) [text/plain]
Saving to: ‘t2d_bbj.txt.gz’

t2d_bbj.txt.gz      100%[===================>] 261.49M  10.6MB/s    in 25s     

2022-10-20 22:22:36 (10.6 MB/s) - ‘t2d_bbj.txt.gz’ saved [274187574/274187574]

load sumstats into gwaslab.Sumstats¶

In [2]:
Copied!
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
             snpid="SNP",
             chrom="CHR",
             pos="POS",
             ea="ALT",
             nea="REF",
             neaf="Frq",
             beta="BETA",
             se="SE",
             p="P",
             direction="Dir",
             n="N")
mysumstats = gl.Sumstats("t2d_bbj.txt.gz", snpid="SNP", chrom="CHR", pos="POS", ea="ALT", nea="REF", neaf="Frq", beta="BETA", se="SE", p="P", direction="Dir", n="N")
Fri Nov  4 12:13:56 2022 Start to initiate from file :t2d_bbj.txt.gz
Fri Nov  4 12:14:26 2022  -Reading columns          : SNP,BETA,SE,CHR,REF,ALT,Frq,P,N,POS,Dir
Fri Nov  4 12:14:26 2022  -Renaming columns to      : SNPID,BETA,SE,CHR,NEA,EA,EAF,P,N,POS,DIRECTION
Fri Nov  4 12:14:26 2022  -Current dataframe shape  : Rows  12557761  x  11  Columns
Fri Nov  4 12:14:32 2022  -Initiating a status column ...
Fri Nov  4 12:14:39 2022  -NEAF is specified...
Fri Nov  4 12:14:39 2022  -Checking if 0<= NEAF <=1 ...
Fri Nov  4 12:14:43 2022  -Converted NEAF to EAF.
Fri Nov  4 12:14:43 2022  -Removed 0 variants with bad NEAF.
Fri Nov  4 12:14:43 2022  -Reordering columns to    : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
Fri Nov  4 12:14:47 2022 Finished loading data successfully!

a quick mmq plot¶

In [3]:
Copied!
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
mysumstats.plot_mqq(skip=3,cut=20, mode="mqq",stratified=True)
Fri Nov  4 12:14:47 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Nov  4 12:14:47 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Nov  4 12:14:47 2022  -Raw input contains 12557761 variants...
Fri Nov  4 12:14:47 2022  -Plot layout mode is : mqq
Fri Nov  4 12:14:56 2022 Finished loading specified columns from the sumstats.
Fri Nov  4 12:14:56 2022 Start conversion and sanity check:
Fri Nov  4 12:14:59 2022  -Removed 328791 variants with nan in CHR or POS column ...
Fri Nov  4 12:15:00 2022  -Removed 0 variants with nan in EAF column ...
Fri Nov  4 12:15:01 2022  -Removed 0 variants with nan in P column ...
Fri Nov  4 12:15:01 2022  -P values are being converted to -log10(P)...
Fri Nov  4 12:15:01 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Nov  4 12:15:03 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Nov  4 12:15:07 2022  -Maximum -log10(P) values is 167.58838029403677 .
Fri Nov  4 12:15:07 2022  -Minus log10(P) values above 20 will be shrunk with a shrinkage factor of 10...
Fri Nov  4 12:15:07 2022 Finished data conversion and sanity check.
Fri Nov  4 12:15:08 2022 Start to create manhattan plot with 87571 variants:
Fri Nov  4 12:15:09 2022  -Found 84 significant variants with a sliding window size of 500 kb...
Fri Nov  4 12:15:09 2022 Finished creating Manhattan plot successfully!
Fri Nov  4 12:15:09 2022  -Skip annotating
Fri Nov  4 12:15:09 2022 Start to create QQ plot with 87571 variants:
Fri Nov  4 12:15:11 2022  -Calculating GC using P : 1.2139048028292598
Fri Nov  4 12:15:11 2022 Finished creating QQ plot successfully!
Out[3]:
(<Figure size 3000x1000 with 2 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)

filter and basic check¶

In [5]:
Copied!
mysumstats.filter_in(eq={"CHR":"7"},inplace=True)
mysumstats.basic_check()
mysumstats.filter_in(eq={"CHR":"7"},inplace=True) mysumstats.basic_check()
Fri Nov  4 12:15:32 2022 Start filtering values:
Fri Nov  4 12:15:33 2022  -Keeping 707780 variants with CHR = 7 ...
Fri Nov  4 12:15:34 2022 Finished filtering values.
Fri Nov  4 12:15:34 2022 Start to check IDs...
Fri Nov  4 12:15:34 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:34 2022  -Checking if SNPID is chr:pos:ref:alt...(separator: - ,: , _)
Fri Nov  4 12:15:36 2022 Finished checking IDs successfully!
Fri Nov  4 12:15:36 2022 Start to fix chromosome notation...
Fri Nov  4 12:15:36 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:39 2022  -All CHR are already fixed...
Fri Nov  4 12:15:41 2022 Finished fixing chromosome notation successfully!
Fri Nov  4 12:15:41 2022 Start to fix basepair positions...
Fri Nov  4 12:15:41 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:43 2022  -Position upper_bound is: 250,000,000
Fri Nov  4 12:15:46 2022  -Remove outliers: 0
Fri Nov  4 12:15:46 2022  -Converted all position to datatype Int64.
Fri Nov  4 12:15:46 2022 Finished fixing basepair position successfully!
Fri Nov  4 12:15:46 2022 Start to fix alleles...
Fri Nov  4 12:15:46 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:46 2022  -Detected 0 variants with alleles that contain bases other than A/C/T/G .
Fri Nov  4 12:15:46 2022  -Converted all bases to string datatype and UPPERCASE.
Fri Nov  4 12:15:48 2022 Finished fixing allele successfully!
Fri Nov  4 12:15:48 2022 Start sanity check for statistics ...
Fri Nov  4 12:15:48 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:48 2022  -Checking if  0 <=N<= inf  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad N.
Fri Nov  4 12:15:49 2022  -Checking if  0 <=EAF<= 1  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad EAF.
Fri Nov  4 12:15:49 2022  -Checking if  5 <=MAC<= inf  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad MAC.
Fri Nov  4 12:15:49 2022  -Checking if  5e-300 <= P <= 1  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad P.
Fri Nov  4 12:15:49 2022  -Checking if  -10 <BETA)< 10  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad BETA.
Fri Nov  4 12:15:49 2022  -Checking if  0 <SE< inf  ...
Fri Nov  4 12:15:49 2022  -Removed 0 variants with bad SE.
Fri Nov  4 12:15:49 2022  -Checking STATUS...
Fri Nov  4 12:15:50 2022  -Coverting STAUTUS to category.
Fri Nov  4 12:15:50 2022  -Removed 0 variants with bad statistics in total.
Fri Nov  4 12:15:50 2022 Finished sanity check successfully!
Fri Nov  4 12:15:50 2022 Start to normalize variants...
Fri Nov  4 12:15:50 2022  -Current Dataframe shape : 707780  x  12
Fri Nov  4 12:15:51 2022  -No available variants to normalize..
Fri Nov  4 12:15:51 2022 Finished normalizing variants successfully!
In [6]:
Copied!
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl",build="19")
Fri Nov  4 12:15:51 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Nov  4 12:15:51 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Nov  4 12:15:51 2022  -Raw input contains 707780 variants...
Fri Nov  4 12:15:51 2022  -Plot layout mode is : m
Fri Nov  4 12:15:51 2022 Finished loading specified columns from the sumstats.
Fri Nov  4 12:15:51 2022 Start conversion and sanity check:
Fri Nov  4 12:15:51 2022  -Removed 0 variants with nan in CHR or POS column ...
Fri Nov  4 12:15:51 2022  -Removed 0 variants with nan in P column ...
Fri Nov  4 12:15:51 2022  -P values are being converted to -log10(P)...
Fri Nov  4 12:15:51 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Nov  4 12:15:51 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Nov  4 12:15:51 2022  -Maximum -log10(P) values is 73.38711023071251 .
Fri Nov  4 12:15:51 2022 Finished data conversion and sanity check.
Fri Nov  4 12:15:53 2022 Start to create manhattan plot with 707780 variants:
Fri Nov  4 12:15:56 2022  -Found 7 significant variants with a sliding window size of 500 kb...
Fri Nov  4 12:15:56 2022 Start to annotate variants with nearest gene name(s)...
Fri Nov  4 12:15:56 2022  -Assigning Gene name using built-in Ensembl Release 75  (hg19)
Fri Nov  4 12:15:56 2022 Finished annotating variants with nearest gene name(s) successfully!
Fri Nov  4 12:15:56 2022 Finished creating Manhattan plot successfully!
Fri Nov  4 12:15:56 2022  -Annotating using column GENENAME...
Out[6]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [7]:
Copied!
mysumstats.get_lead()
mysumstats.get_lead()
Fri Nov  4 12:16:01 2022 Start to extract lead variants...
Fri Nov  4 12:16:01 2022  -Processing 707780 variants...
Fri Nov  4 12:16:01 2022  -Significance threshold : 5e-08
Fri Nov  4 12:16:01 2022  -Sliding window size: 500  kb
Fri Nov  4 12:16:01 2022  -Found 1077 significant variants in total...
Fri Nov  4 12:16:01 2022  -Identified 7 lead variants!
Fri Nov  4 12:16:01 2022 Finished extracting lead variants successfully!
Out[7]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099
In [8]:
Copied!
mysumstats.get_lead(anno=True)
mysumstats.get_lead(anno=True)
Fri Nov  4 12:16:01 2022 Start to extract lead variants...
Fri Nov  4 12:16:01 2022  -Processing 707780 variants...
Fri Nov  4 12:16:01 2022  -Significance threshold : 5e-08
Fri Nov  4 12:16:01 2022  -Sliding window size: 500  kb
Fri Nov  4 12:16:01 2022  -Found 1077 significant variants in total...
Fri Nov  4 12:16:02 2022  -Identified 7 lead variants!
Fri Nov  4 12:16:02 2022 Start to annotate variants with nearest gene name(s)...
Fri Nov  4 12:16:02 2022  -Assigning Gene name using built-in Ensembl Release 75  (hg19)
Fri Nov  4 12:16:02 2022 Finished annotating variants with nearest gene name(s) successfully!
Fri Nov  4 12:16:02 2022 Finished extracting lead variants successfully!
Out[8]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS LOCATION GENE
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099 42154 ETV1
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099 0 DGKB
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099 3606 MYL7
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099 0 AUTS2
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099 0 PAX4
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099 0 CPA1
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099 0 UBE3C
In [9]:
Copied!
mysumstats.plot_mqq(region=(7,156538803,157538803))
mysumstats.plot_mqq(region=(7,156538803,157538803))
Fri Nov  4 12:16:02 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Nov  4 12:16:02 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Nov  4 12:16:02 2022  -Raw input contains 707780 variants...
Fri Nov  4 12:16:02 2022  -Plot layout mode is : mqq
Fri Nov  4 12:16:02 2022  -Region to plot : chr7:156538803-157538803.
Fri Nov  4 12:16:02 2022  -Extract SNPs in region : chr7:156538803-157538803...
Fri Nov  4 12:16:04 2022  -Extract SNPs in specified regions: 5831
Fri Nov  4 12:16:04 2022 Finished loading specified columns from the sumstats.
Fri Nov  4 12:16:04 2022 Start conversion and sanity check:
Fri Nov  4 12:16:04 2022  -Removed 0 variants with nan in CHR or POS column ...
Fri Nov  4 12:16:04 2022  -Removed 0 variants with nan in P column ...
Fri Nov  4 12:16:04 2022  -P values are being converted to -log10(P)...
Fri Nov  4 12:16:04 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Nov  4 12:16:04 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Nov  4 12:16:05 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Nov  4 12:16:05 2022 Finished data conversion and sanity check.
Fri Nov  4 12:16:05 2022 Start to create manhattan plot with 5831 variants:
Fri Nov  4 12:16:05 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Nov  4 12:16:05 2022 Finished creating Manhattan plot successfully!
Fri Nov  4 12:16:05 2022  -Skip annotating
Fri Nov  4 12:16:05 2022 Start to create QQ plot with 5831 variants:
Fri Nov  4 12:16:05 2022  -Calculating GC using P : 1.7191388184129959
Fri Nov  4 12:16:05 2022 Finished creating QQ plot successfully!
Out[9]:
(<Figure size 3000x1000 with 3 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [10]:
Copied!
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
mysumstats.plot_mqq(mode="r", region=(7,156538803,157538803),region_grid=True, gtf_path="ensembl")
Fri Nov  4 12:16:06 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Nov  4 12:16:06 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Nov  4 12:16:06 2022  -Raw input contains 707780 variants...
Fri Nov  4 12:16:06 2022  -Plot layout mode is : r
Fri Nov  4 12:16:06 2022  -Region to plot : chr7:156538803-157538803.
Fri Nov  4 12:16:06 2022  -Extract SNPs in region : chr7:156538803-157538803...
Fri Nov  4 12:16:08 2022  -Extract SNPs in specified regions: 5831
Fri Nov  4 12:16:08 2022 Finished loading specified columns from the sumstats.
Fri Nov  4 12:16:08 2022 Start conversion and sanity check:
Fri Nov  4 12:16:08 2022  -Removed 0 variants with nan in CHR or POS column ...
Fri Nov  4 12:16:08 2022  -Removed 0 variants with nan in P column ...
Fri Nov  4 12:16:08 2022  -P values are being converted to -log10(P)...
Fri Nov  4 12:16:08 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Nov  4 12:16:08 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Nov  4 12:16:09 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Nov  4 12:16:09 2022 Finished data conversion and sanity check.
Fri Nov  4 12:16:09 2022 Start to create manhattan plot with 5831 variants:
Fri Nov  4 12:16:09 2022  -Loading gtf files from:ensembl
Fri Nov  4 12:16:27 2022  -plotting gene track..
Fri Nov  4 12:16:27 2022  -Finished plotting gene track..
Fri Nov  4 12:16:27 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Nov  4 12:16:27 2022 Finished creating Manhattan plot successfully!
Fri Nov  4 12:16:27 2022  -Skip annotating
Out[10]:
(<Figure size 3000x2000 with 3 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [11]:
Copied!
lead_variants = mysumstats.get_lead()
lead_variants
lead_variants = mysumstats.get_lead() lead_variants
Fri Nov  4 12:16:28 2022 Start to extract lead variants...
Fri Nov  4 12:16:28 2022  -Processing 707780 variants...
Fri Nov  4 12:16:28 2022  -Significance threshold : 5e-08
Fri Nov  4 12:16:28 2022  -Sliding window size: 500  kb
Fri Nov  4 12:16:29 2022  -Found 1077 significant variants in total...
Fri Nov  4 12:16:29 2022  -Identified 7 lead variants!
Fri Nov  4 12:16:29 2022 Finished extracting lead variants successfully!
Out[11]:
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
5474489 7:13888699_G_C 7 13888699 G C 0.5680 0.0562 0.0094 2.507000e-09 191764 ++++ 9960099
5480067 7:14898282_C_T 7 14898282 C T 0.6012 0.0617 0.0088 2.336000e-12 191764 ++++ 9960099
5626346 7:44174857_T_G 7 44174857 G T 0.5985 -0.0640 0.0093 5.325000e-12 191764 ---- 9960099
5732279 7:69406661_A_T 7 69406661 T A 0.1981 -0.0900 0.0111 4.871000e-16 191764 ---- 9960099
5965364 7:127253550_C_T 7 127253550 C T 0.9081 0.2761 0.0152 4.101000e-74 191764 ++++ 9960099
5976830 7:130025713_G_A 7 130025713 G A 0.9530 -0.1365 0.0230 3.068000e-09 191764 ---- 9960099
6092347 7:157038803_A_G 7 157038803 G A 0.4626 -0.0502 0.0088 1.127000e-08 191764 ---- 9960099
In [8]:
Copied!
flank = 100000
for index,row in lead_variants.iterrows():
        mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300})
                        #vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
flank = 100000 for index,row in lead_variants.iterrows(): mysumstats.plot_mqq(mode="r",region=(row["CHR"],row["POS"]-flank,row["POS"]+flank),region_grid=True,mqqratio=2,taf=[4,0,0.95,1,1],figargs={"figsize":(15,5),"dpi":300}) #vcf_path="/home/yunye/mydata/d_disk/eas_1kg_af/EAS.chr7.split_norm_af.vcf.gz")
Fri Oct 21 01:20:36 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:36 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:36 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:36 2022  -Plot layout mode is : r
Fri Oct 21 01:20:36 2022  -Region to plot : chr7:13788699-13988699.
Fri Oct 21 01:20:36 2022  -Extract SNPs in region : chr7:13788699-13988699...
Fri Oct 21 01:20:37 2022  -Extract SNPs in specified regions: 1225
Fri Oct 21 01:20:37 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:37 2022 Start conversion and sanity check:
Fri Oct 21 01:20:37 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:37 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:37 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:37 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:37 2022  -Maximum -log10(P) values is 8.600845666041783 .
Fri Oct 21 01:20:37 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:37 2022 Start to create manhattan plot with 1224 variants:
Fri Oct 21 01:20:37 2022  -Loading gtf files from:defualt
Fri Oct 21 01:20:46 2022  -plotting gene track..
Fri Oct 21 01:20:46 2022  -Finished plotting gene track..
Fri Oct 21 01:20:46 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:20:46 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:20:46 2022  -Skip annotating
Fri Oct 21 01:20:46 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:46 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:46 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:46 2022  -Plot layout mode is : r
Fri Oct 21 01:20:46 2022  -Region to plot : chr7:14798282-14998282.
Fri Oct 21 01:20:46 2022  -Extract SNPs in region : chr7:14798282-14998282...
Fri Oct 21 01:20:47 2022  -Extract SNPs in specified regions: 993
Fri Oct 21 01:20:47 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:47 2022 Start conversion and sanity check:
Fri Oct 21 01:20:47 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:47 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:47 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:47 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:47 2022  -Maximum -log10(P) values is 11.631527161559639 .
Fri Oct 21 01:20:47 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:47 2022 Start to create manhattan plot with 993 variants:
Fri Oct 21 01:20:47 2022  -Loading gtf files from:defualt
Fri Oct 21 01:20:56 2022  -plotting gene track..
Fri Oct 21 01:20:56 2022  -Finished plotting gene track..
Fri Oct 21 01:20:56 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:20:56 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:20:56 2022  -Skip annotating
Fri Oct 21 01:20:56 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:20:56 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:20:56 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:20:56 2022  -Plot layout mode is : r
Fri Oct 21 01:20:56 2022  -Region to plot : chr7:44074857-44274857.
Fri Oct 21 01:20:56 2022  -Extract SNPs in region : chr7:44074857-44274857...
Fri Oct 21 01:20:58 2022  -Extract SNPs in specified regions: 929
Fri Oct 21 01:20:58 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:20:58 2022 Start conversion and sanity check:
Fri Oct 21 01:20:58 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:20:58 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:20:58 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:20:58 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:20:58 2022  -Maximum -log10(P) values is 11.273680387889225 .
Fri Oct 21 01:20:58 2022 Finished data conversion and sanity check.
Fri Oct 21 01:20:58 2022 Start to create manhattan plot with 929 variants:
Fri Oct 21 01:20:58 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:07 2022  -plotting gene track..
Fri Oct 21 01:21:07 2022  -Finished plotting gene track..
Fri Oct 21 01:21:07 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:07 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:07 2022  -Skip annotating
Fri Oct 21 01:21:07 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:07 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:07 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:07 2022  -Plot layout mode is : r
Fri Oct 21 01:21:07 2022  -Region to plot : chr7:69306661-69506661.
Fri Oct 21 01:21:07 2022  -Extract SNPs in region : chr7:69306661-69506661...
Fri Oct 21 01:21:09 2022  -Extract SNPs in specified regions: 501
Fri Oct 21 01:21:09 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:09 2022 Start conversion and sanity check:
Fri Oct 21 01:21:09 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:09 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:09 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:09 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:09 2022  -Maximum -log10(P) values is 15.31238187042823 .
Fri Oct 21 01:21:09 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:09 2022 Start to create manhattan plot with 501 variants:
Fri Oct 21 01:21:09 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:18 2022  -plotting gene track..
Fri Oct 21 01:21:18 2022  -Finished plotting gene track..
Fri Oct 21 01:21:18 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:18 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:18 2022  -Skip annotating
Fri Oct 21 01:21:18 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:18 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:18 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:18 2022  -Plot layout mode is : r
Fri Oct 21 01:21:18 2022  -Region to plot : chr7:127153550-127353550.
Fri Oct 21 01:21:18 2022  -Extract SNPs in region : chr7:127153550-127353550...
Fri Oct 21 01:21:19 2022  -Extract SNPs in specified regions: 713
Fri Oct 21 01:21:19 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:19 2022 Start conversion and sanity check:
Fri Oct 21 01:21:19 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:19 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:19 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:19 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:19 2022  -Maximum -log10(P) values is 73.38711023071251 .
Fri Oct 21 01:21:19 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:19 2022 Start to create manhattan plot with 713 variants:
Fri Oct 21 01:21:19 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:28 2022  -plotting gene track..
Fri Oct 21 01:21:28 2022  -Finished plotting gene track..
Fri Oct 21 01:21:28 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:28 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:28 2022  -Skip annotating
Fri Oct 21 01:21:28 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:28 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:28 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:28 2022  -Plot layout mode is : r
Fri Oct 21 01:21:28 2022  -Region to plot : chr7:129925713-130125713.
Fri Oct 21 01:21:28 2022  -Extract SNPs in region : chr7:129925713-130125713...
Fri Oct 21 01:21:30 2022  -Extract SNPs in specified regions: 808
Fri Oct 21 01:21:30 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:30 2022 Start conversion and sanity check:
Fri Oct 21 01:21:30 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:30 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:30 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:30 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:30 2022  -Maximum -log10(P) values is 8.513144644723056 .
Fri Oct 21 01:21:30 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:30 2022 Start to create manhattan plot with 808 variants:
Fri Oct 21 01:21:30 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:39 2022  -plotting gene track..
Fri Oct 21 01:21:39 2022  -Finished plotting gene track..
Fri Oct 21 01:21:39 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:39 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:39 2022  -Skip annotating
Fri Oct 21 01:21:39 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Oct 21 01:21:39 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Oct 21 01:21:39 2022  -Raw input contains 707780 variants...
Fri Oct 21 01:21:39 2022  -Plot layout mode is : r
Fri Oct 21 01:21:39 2022  -Region to plot : chr7:156938803-157138803.
Fri Oct 21 01:21:39 2022  -Extract SNPs in region : chr7:156938803-157138803...
Fri Oct 21 01:21:40 2022  -Extract SNPs in specified regions: 1121
Fri Oct 21 01:21:40 2022 Finished loading specified columns from the sumstats.
Fri Oct 21 01:21:40 2022 Start conversion and sanity check:
Fri Oct 21 01:21:40 2022  -Removed 0 variants with nan in P column ...
Fri Oct 21 01:21:40 2022  -P values are being converted to -log10(P)...
Fri Oct 21 01:21:40 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Oct 21 01:21:40 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Oct 21 01:21:40 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Oct 21 01:21:40 2022 Finished data conversion and sanity check.
Fri Oct 21 01:21:40 2022 Start to create manhattan plot with 1121 variants:
Fri Oct 21 01:21:41 2022  -Loading gtf files from:defualt
Fri Oct 21 01:21:50 2022  -plotting gene track..
Fri Oct 21 01:21:50 2022  -Finished plotting gene track..
Fri Oct 21 01:21:50 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Oct 21 01:21:50 2022 Finished creating Manhattan plot successfully!
Fri Oct 21 01:21:50 2022  -Skip annotating

check my available reference data¶

In [12]:
Copied!
gl.check_downloaded_ref()
gl.check_downloaded_ref()
Fri Nov  4 12:16:29 2022 Start to check downloaded reference files...
Fri Nov  4 12:16:29 2022 1kg_eas_hg19  :  /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
Fri Nov  4 12:16:29 2022 1kg_eas_hg19_tbi  :  /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz.tbi
Fri Nov  4 12:16:29 2022 testlink  :  /Users/he/work/gwaslab/src/gwaslab/data/EAS.chr22.split_norm_af.1kgp3v5.vcf.gz
Out[12]:
{'1kg_eas_hg19': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz',
 '1kg_eas_hg19_tbi': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz.tbi',
 'testlink': '/Users/he/work/gwaslab/src/gwaslab/data/EAS.chr22.split_norm_af.1kgp3v5.vcf.gz'}
In [18]:
Copied!
gl.get_path("1kg_eas_hg19")
gl.get_path("1kg_eas_hg19")
Out[18]:
'/Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz'
In [17]:
Copied!
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True,
                    vcf_path=gl.get_path("1kg_eas_hg19"),taf=[4,0,0.95,1,1])
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803),region_grid=True, vcf_path=gl.get_path("1kg_eas_hg19"),taf=[4,0,0.95,1,1])
Fri Nov  4 12:32:36 2022 Start to plot manhattan/qq plot with the following basic settings:
Fri Nov  4 12:32:36 2022  -Genome-wide significance level is set to 5e-08 ...
Fri Nov  4 12:32:36 2022  -Raw input contains 707780 variants...
Fri Nov  4 12:32:36 2022  -Plot layout mode is : r
Fri Nov  4 12:32:36 2022  -Region to plot : chr7:156538803-157538803.
Fri Nov  4 12:32:36 2022  -Extract SNPs in region : chr7:156538803-157538803...
Fri Nov  4 12:32:38 2022  -Extract SNPs in specified regions: 5831
Fri Nov  4 12:32:38 2022 Finished loading specified columns from the sumstats.
Fri Nov  4 12:32:38 2022 Start conversion and sanity check:
Fri Nov  4 12:32:38 2022  -Removed 0 variants with nan in CHR or POS column ...
Fri Nov  4 12:32:38 2022  -Removed 0 variants with nan in P column ...
Fri Nov  4 12:32:38 2022  -P values are being converted to -log10(P)...
Fri Nov  4 12:32:38 2022  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
Fri Nov  4 12:32:38 2022  -Sanity check: 0 na/inf/-inf variants will be removed...
Fri Nov  4 12:32:38 2022  -Maximum -log10(P) values is 7.948076083953893 .
Fri Nov  4 12:32:38 2022 Finished data conversion and sanity check.
Fri Nov  4 12:32:38 2022 Start to load reference genotype...
Fri Nov  4 12:32:38 2022  -reference vcf path : /Users/he/work/gwaslab/src/gwaslab/data/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
Fri Nov  4 12:32:40 2022  -Retrieving index...
16384
Fri Nov  4 12:32:40 2022  -Calculating Rsq...
Fri Nov  4 12:32:40 2022 Finished loading reference genotype successfully!
Fri Nov  4 12:32:40 2022 Start to create manhattan plot with 5831 variants:
Fri Nov  4 12:32:41 2022  -Loading gtf files from:defualt
Fri Nov  4 12:32:52 2022  -plotting gene track..
Fri Nov  4 12:32:53 2022  -Finished plotting gene track..
Fri Nov  4 12:32:53 2022  -Found 1 significant variants with a sliding window size of 500 kb...
Fri Nov  4 12:32:53 2022 Finished creating Manhattan plot successfully!
Fri Nov  4 12:32:53 2022  -Skip annotating
Out[17]:
(<Figure size 3000x2000 with 4 Axes>, <gwaslab.Log.Log at 0x7fcd214a0a30>)
In [ ]:
Copied!


GWASLab is licensed under the MIT license
Documentation built with MkDocs.

Keyboard Shortcuts

Keys Action
? Open this help
n Next page
p Previous page
s Search